 momentum encoder


Representation Learning via Consistent Assignment of Views over Random Partitions

Neural Information Processing Systems

CARP learns prototypes in an end-to-end online fashion using gradient descent without additional non-differentiable modules to solve the cluster assignment problem. CARP optimizes a new pretext task based on random partitions of prototypes that regularizes the model and enforces consistency between views' assignments.
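As a rough illustration of the pretext task described above, here is a minimal sketch of a consistency loss computed over random partitions of the prototypes. It assumes two L2-normalized view embeddings `z1`, `z2` and a learnable prototype matrix `prototypes`; the block size, stop-gradient placement, and symmetrization are illustrative assumptions, not CARP's exact formulation.

```python
import torch
import torch.nn.functional as F

def random_partition_consistency(z1, z2, prototypes, block_size=64):
    """z1, z2: L2-normalized embeddings of two views, shape (B, D).
    prototypes: learnable prototype matrix, shape (K, D), K divisible by block_size."""
    K = prototypes.shape[0]
    logits1 = z1 @ prototypes.t()                     # (B, K) view-to-prototype scores
    logits2 = z2 @ prototypes.t()
    blocks = torch.randperm(K).view(-1, block_size)   # random partition of the K prototypes
    loss = 0.0
    for block in blocks:
        p1 = F.softmax(logits1[:, block], dim=-1)     # view-1 assignment within the block
        p2 = F.softmax(logits2[:, block], dim=-1)     # view-2 assignment within the block
        logp1 = F.log_softmax(logits1[:, block], dim=-1)
        logp2 = F.log_softmax(logits2[:, block], dim=-1)
        # Symmetrized cross-entropy between the two views' assignments,
        # with a stop-gradient on the target distribution (an assumption here).
        loss = loss - (p1.detach() * logp2).sum(dim=-1).mean()
        loss = loss - (p2.detach() * logp1).sum(dim=-1).mean()
    return loss / (2 * blocks.shape[0])
```

Because each softmax is restricted to a random block of prototypes, the task changes from step to step, which is one way to read the regularizing effect described in the abstract.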


BMU-MoCo: Bidirectional Momentum Update for Continual Video-Language Modeling

Neural Information Processing Systems

Different from the original MoCo [19] and its cross-modal versions [15, 33, 35], which apply the momentum update only to the momentum encoders in order to maintain a large, consistent queue, our BMU strategy imposes the momentum update on both the momentum encoders and the (video/text) encoders.
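A minimal sketch of one plausible reading of this bidirectional update, assuming two architecturally identical PyTorch modules `encoder` and `momentum_encoder`; the coefficients `m_fwd` and `m_bwd` are hypothetical and the paper's exact update may differ.

```python
import torch

@torch.no_grad()
def bidirectional_momentum_update(encoder, momentum_encoder, m_fwd=0.999, m_bwd=0.999):
    for p, p_m in zip(encoder.parameters(), momentum_encoder.parameters()):
        # Standard MoCo direction: the momentum encoder slowly tracks the encoder.
        p_m.data.mul_(m_fwd).add_(p.data, alpha=1.0 - m_fwd)
        # Reverse direction: the encoder is pulled back toward the momentum
        # encoder, letting it "review" knowledge retained there.
        p.data.mul_(m_bwd).add_(p_m.data, alpha=1.0 - m_bwd)
```

The function would be called once per training step, after the gradient update to the encoder.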


BMU-MoCo: Bidirectional Momentum Update for Continual Video-Language Modeling

Neural Information Processing Systems

Video-language models suffer from forgetting old/learned knowledge when trained with streaming data. In this work, we thus propose a continual video-language modeling (CVLM) setting, where models are supposed to be sequentially trained on five widely-used video-text datasets with different data distributions. Although most existing continual learning methods have achieved great success by exploiting extra information (e.g., memory data of past tasks) or dynamically extended networks, they cause enormous resource consumption when transferred to our CVLM setting. To overcome the challenges (i.e., catastrophic forgetting and heavy resource consumption) in CVLM, we propose a novel cross-modal MoCo-based model with bidirectional momentum update (BMU), termed BMU-MoCo. Concretely, our BMU-MoCo has two core designs: (1) Different from the conventional MoCo, we apply the momentum update not only to the momentum encoders but also to the encoders (i.e., bidirectional) at each training step, which enables the model to review the learned knowledge retained in the momentum encoders.


Boosting Medical Vision-Language Pretraining via Momentum Self-Distillation under Limited Computing Resources

Pham, Phuc, Pham, Nhu, Ly, Ngoc Quoc

arXiv.org Artificial Intelligence

In medical healthcare, obtaining detailed annotations is challenging, highlighting the need for robust Vision-Language Models (VLMs). Pretrained VLMs enable fine-tuning on small datasets or zero-shot inference, achieving performance comparable to task-specific models. Contrastive learning (CL) is a key paradigm for training VLMs but inherently requires large batch sizes for effective learning, making it computationally demanding and often limited to well-resourced institutions. Moreover, with limited data in healthcare, it is important to prioritize knowledge extraction from both data and models during training to improve performance. Therefore, we focus on leveraging the momentum method combined with distillation to simultaneously address computational efficiency and knowledge exploitation. Our contributions can be summarized as follows: (1) leveraging momentum self-distillation to enhance multimodal learning, and (2) integrating momentum mechanisms with gradient accumulation to enlarge the effective batch size without increasing resource consumption. Our method attains competitive performance with state-of-the-art (SOTA) approaches in zero-shot classification, while providing a substantial boost in few-shot adaptation, achieving over 90% AUC-ROC and improving retrieval tasks by 2-3%. Importantly, our method achieves high training efficiency with a single GPU while maintaining reasonable training time. Our approach aims to advance efficient multimodal learning by reducing resource requirements while improving performance over SOTA methods. The implementation of our method is available at https://github.com/phphuc612/MSD.
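The sketch below illustrates the two ingredients named in contribution (2): a momentum (EMA) teacher for self-distillation combined with gradient accumulation, shown in a simplified single-modality training loop. The names (`student`, `teacher`, `accum_steps`), the KL-based distillation loss, and the loader format are assumptions for illustration, not the paper's implementation.

```python
import copy
import torch
import torch.nn.functional as F

@torch.no_grad()
def ema_update(student, teacher, m=0.999):
    # Momentum teacher: exponential moving average of the student's weights.
    for p_s, p_t in zip(student.parameters(), teacher.parameters()):
        p_t.data.mul_(m).add_(p_s.data, alpha=1.0 - m)

def train_epoch(student, loader, optimizer, accum_steps=8, tau=0.1):
    teacher = copy.deepcopy(student)
    for p in teacher.parameters():
        p.requires_grad_(False)
    optimizer.zero_grad()
    for step, (images, _) in enumerate(loader):     # loader assumed to yield (images, labels)
        s = F.normalize(student(images), dim=-1)
        with torch.no_grad():
            t = F.normalize(teacher(images), dim=-1)
        # Self-distillation: the student matches the momentum teacher's softened targets.
        loss = F.kl_div(F.log_softmax(s / tau, dim=-1),
                        F.softmax(t / tau, dim=-1), reduction="batchmean")
        (loss / accum_steps).backward()             # accumulate gradients over micro-batches
        if (step + 1) % accum_steps == 0:
            optimizer.step()                        # one update per enlarged effective batch
            optimizer.zero_grad()
            ema_update(student, teacher)            # refresh the momentum teacher
```

Accumulating gradients over `accum_steps` micro-batches is what enlarges the effective batch size on a single GPU without increasing per-step memory.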



Unsupervised Transformer Pre-Training for Images: Self-Distillation, Mean Teachers, and Random Crops

Scardecchia, Mattia

arXiv.org Artificial Intelligence

Recent advances in self-supervised learning (SSL) have made it possible to learn general-purpose visual features that capture both the high-level semantics and the fine-grained spatial structure of images. Most notably, the recent DINOv2 has established a new state of the art by surpassing weakly supervised (WSL) methods like OpenCLIP on most benchmarks. In this survey, we examine the core ideas behind its approach, multi-crop view augmentation and self-distillation with a mean teacher, and trace their development in previous work. We then compare the performance of DINO and DINOv2 with other SSL and WSL methods across various downstream tasks, and highlight some remarkable emergent properties of their learned features with transformer backbones. We conclude by briefly discussing DINOv2's limitations, its impact, and future research directions.
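A condensed sketch of the two ideas the survey centers on, multi-crop augmentation and self-distillation with a mean teacher. It omits DINO's centering and sharpening of teacher outputs, and all names and temperatures are illustrative rather than the published hyperparameters.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def update_mean_teacher(student, teacher, m=0.996):
    # Teacher weights are an exponential moving average of the student's.
    for p_s, p_t in zip(student.parameters(), teacher.parameters()):
        p_t.data.mul_(m).add_(p_s.data, alpha=1.0 - m)

def multicrop_distillation_loss(student, teacher, global_crops, local_crops,
                                t_student=0.1, t_teacher=0.04):
    # The teacher sees only the global crops; the student sees every crop.
    with torch.no_grad():
        t_out = [F.softmax(teacher(c) / t_teacher, dim=-1) for c in global_crops]
    s_out = [F.log_softmax(student(c) / t_student, dim=-1)
             for c in global_crops + local_crops]
    loss, n_terms = 0.0, 0
    for i, t in enumerate(t_out):
        for j, s in enumerate(s_out):
            if i == j:                  # skip matching a global crop with itself
                continue
            loss = loss - (t * s).sum(dim=-1).mean()   # cross-entropy to teacher targets
            n_terms += 1
    return loss / n_terms
```

The gradient flows only through the student; the mean teacher is updated with `update_mean_teacher` after each optimizer step.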



Review for NeurIPS paper: Unsupervised Learning of Visual Features by Contrasting Cluster Assignments

Neural Information Processing Systems

Weaknesses: The paper has many weak points, unfortunately. They are presented below as separate categories. Intro/Motivation: The paper focuses too much on "not using a momentum encoder" and "not using a memory bank". All these are largely irrelevant points. Firstly, until one shows that one gets no benefit from a momentum encoder, it is best not to claim that "not having momentum" is a contribution / a positive aspect of the model.


Frequency-Masked Embedding Inference: A Non-Contrastive Approach for Time Series Representation Learning

Fu, En, Hu, Yanyan

arXiv.org Artificial Intelligence

Contrastive learning underpins most current self-supervised time series representation methods. The strategy for constructing positive and negative sample pairs significantly affects the final representation quality. However, due to the continuous nature of time series semantics, the modeling approach of contrastive learning struggles to accommodate the characteristics of time series data. This results in issues such as difficulties in constructing hard negative samples and the potential introduction of inappropriate biases during positive sample construction. Although some recent works have developed several scientific strategies for constructing positive and negative sample pairs with improved effectiveness, they remain constrained by the contrastive learning framework. To fundamentally overcome the limitations of contrastive learning, this paper introduces Frequency-masked Embedding Inference (FEI), a novel non-contrastive method that completely eliminates the need for positive and negative samples. The proposed FEI constructs 2 inference branches based on a prompting strategy: 1) Using frequency masking as prompts to infer the embedding representation of the target series with missing frequency bands in the embedding space, and 2) Using the target series as prompts to infer its frequency masking embedding. In this way, FEI enables continuous semantic relationship modeling for time series. Experiments on 8 widely used time series datasets for classification and regression tasks, using linear evaluation and end-to-end fine-tuning, show that FEI significantly outperforms existing contrastive-based methods in terms of generalization. This study provides new insights into self-supervised representation learning for time series. The code is available at https://github.com/USTBInnovationPark/Frequency-masked-Embedding-Inference.
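To make the frequency-masking prompt concrete, here is a minimal sketch of generating a frequency-masked view of a time-series batch with PyTorch's FFT utilities; the mask ratio, contiguous band selection, and the comment describing branch 1 are simplifying assumptions, not FEI's exact procedure.

```python
import torch

def frequency_mask(x, mask_ratio=0.3):
    """x: time-series batch of shape (B, T). Zeroes out a random contiguous band
    of frequency bins and returns the masked series plus the binary mask."""
    B, T = x.shape
    spec = torch.fft.rfft(x, dim=-1)                  # (B, T // 2 + 1) complex bins
    n_bins = spec.shape[-1]
    band = max(1, int(mask_ratio * n_bins))
    start = torch.randint(0, n_bins - band + 1, (1,)).item()
    spec[:, start:start + band] = 0                   # drop the selected frequency band
    mask = torch.ones(n_bins)
    mask[start:start + band] = 0.0                    # 0 marks removed bins (used as the prompt)
    x_masked = torch.fft.irfft(spec, n=T, dim=-1)
    return x_masked, mask

# Branch 1 (sketch): with the mask as a prompt, infer the embedding of the
# masked series from the original series, e.g.
#   z_hat = predictor(encoder(x), mask); target = encoder(x_masked).detach()
```

Because the target is an embedding inferred from a prompt rather than a contrasted sample, no positive or negative pairs need to be constructed, which is the non-contrastive property the abstract emphasizes.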